Mahalanobis distance

It measures how many standard deviations away a point x is from the mean of a distribution.
In other words, it measures how far a point is from a distribution or cluster, rather than the distance between two distinct points.

The Mahalanobis distance is unitless, scale-invariant and takes into account the correlations of the data set.
If the data are rotated onto their principal axes and each axis is then rescaled to have variance 1, the Mahalanobis distance equals the ordinary Euclidean distance in the transformed space.

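The formula, for a point $x$ and a distribution with mean $\mu$ and covariance matrix $\Sigma$, is:

$$D_M(x) = \sqrt{(x - \mu)^\top \Sigma^{-1} (x - \mu)}$$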
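A minimal NumPy sketch of $D_M(x) = \sqrt{(x - \mu)^\top \Sigma^{-1} (x - \mu)}$. The helper name `mahalanobis` and the example numbers are made up for illustration. It solves $\Sigma y = x - \mu$ instead of inverting $\Sigma$ explicitly, which gives the same value. The two calls check the claims above: with an identity covariance it reduces to the Euclidean distance, and with independent axes it divides each coordinate by that axis's standard deviation.

```python
import numpy as np

def mahalanobis(x, mu, cov):
    """Hypothetical helper: distance of x from a distribution with mean mu and covariance cov."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)  # x - mu
    # Solve cov @ y = d instead of forming cov^{-1} explicitly (numerically safer)
    y = np.linalg.solve(cov, d)
    return float(np.sqrt(d @ y))  # sqrt((x - mu)^T cov^{-1} (x - mu))

mu = np.array([0.0, 0.0])
# Identity covariance: same as the Euclidean distance
print(mahalanobis([3.0, 4.0], mu, np.eye(2)))  # 5.0
# Independent axes with variances 4 and 1: first coordinate is divided by its std (2)
print(mahalanobis([4.0, 0.0], mu, np.diag([4.0, 1.0])))  # 2.0
```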

Why don't we use the Euclidean distance?

Recall that the Euclidean distance is simply the length of the straight line between two points.


The Euclidean distance works fine as long as the dimensions are equally weighted and independent of each other.


The Euclidean distance between a point and the center (mean) of a distribution can give little or even misleading information about how close the point really is to the cluster.



In the right image, the Euclidean distance would not flag the pink point as an outlier, because it is just as far from the center as the purple point, which actually lies inside the cluster.


So the Euclidean distance alone cannot reliably tell how close a point is to a distribution of points.
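The pink/purple situation can be reproduced with made-up numbers in a small NumPy sketch: a cluster centered at the origin and stretched along the diagonal (correlation 0.9), and two points at exactly the same Euclidean distance from the center. The helper `mahalanobis` is hypothetical, not a library function. The point along the cluster's long axis gets a Mahalanobis distance of about 1.03, while the point across the short axis gets about 4.47.

```python
import numpy as np

def mahalanobis(x, mu, cov):
    # Hypothetical helper: sqrt((x - mu)^T cov^{-1} (x - mu))
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

mu = np.zeros(2)
# Made-up cluster stretched along the line y = x (strong positive correlation)
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])

inside = np.array([1.0, 1.0])    # along the cluster's long axis (like the purple point)
outside = np.array([1.0, -1.0])  # across the cluster's short axis (like the pink point)

for name, p in [("inside", inside), ("outside", outside)]:
    euclid = np.linalg.norm(p - mu)
    print(f"{name}: Euclidean={euclid:.3f}, Mahalanobis={mahalanobis(p, mu, cov):.3f}")
```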


Formula explanation

This is what the Mahalanobis distance actually does:

  1. Gets the vector from the mean to the point.
  2. Undistorts the space (and the vector) by undoing the correlations described by the covariance matrix; this is the role of the inverse covariance matrix.
  3. Calculates the Euclidean length of the undistorted vector, which is now meaningful because there is no correlation anymore.
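The three steps can be sketched in NumPy, assuming a positive-definite covariance matrix. One way (not the only one) to carry out step 2 is through the Cholesky factor $L$ of $\Sigma$ ($\Sigma = L L^\top$): $L^{-1}v$ is the vector in whitened coordinates, and its squared length equals $v^\top \Sigma^{-1} v$. The function names and numbers are hypothetical; the last line compares the step-by-step result with the closed-form formula.

```python
import numpy as np

def mahalanobis_steps(x, mu, cov):
    # Step 1: vector from the mean to the point
    v = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Step 2: undistort it. With cov = L @ L.T (Cholesky), solving L w = v
    # gives w = L^{-1} v, the vector in uncorrelated, unit-variance coordinates
    L = np.linalg.cholesky(cov)
    w = np.linalg.solve(L, v)
    # Step 3: plain Euclidean length in the undistorted space
    return float(np.linalg.norm(w))

def mahalanobis_formula(x, mu, cov):
    # Closed form: sqrt((x - mu)^T cov^{-1} (x - mu))
    v = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(v @ np.linalg.inv(cov) @ v))

cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])
mu = np.array([1.0, -1.0])
x = np.array([2.5, 0.5])
print(mahalanobis_steps(x, mu, cov), mahalanobis_formula(x, mu, cov))  # the two values agree
```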

1) - We get our vector

The vector starts at the mean $\mu$ and ends at the point $x$, so it is obtained by vector subtraction:

$$v = x - \mu$$

2) - Inverse Covariance Matrix

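As you may already know, if you multiply white data (uncorrelated, with unit variance in every dimension) by a matrix $A$, the result has covariance $A A^\top$. Choosing $A$ as a square root of a covariance matrix $\Sigma$ (so that $A A^\top = \Sigma$, e.g. the Cholesky factor of $\Sigma$) turns the white data into a point cloud with correlated dimensions and covariance $\Sigma$.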
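Multiplying white data by a matrix $A$ gives covariance $A A^\top$, so to get a specific covariance $\Sigma$ one can multiply by a square root of it, for example its Cholesky factor. Below is a small NumPy sketch with a made-up $\Sigma$ and a fixed random seed; the sample covariance of the transformed cloud should come out close to $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)
# White data: 100,000 samples of 2 independent, zero-mean, unit-variance dimensions
white = rng.standard_normal((100_000, 2))

# Made-up target covariance and its Cholesky factor (cov = L @ L.T)
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
L = np.linalg.cholesky(cov)

# Apply L to every sample; rows are points, so multiply by L.T on the right
correlated = white @ L.T

print(np.cov(correlated, rowvar=False))  # close to cov
```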

Why does it transform like that?

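Because multiplying by a matrix sends each unit vector to the corresponding column of that matrix, so the axes get stretched and tilted.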
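A quick NumPy check of the unit-vector argument: multiplying a standard unit vector by a matrix returns the matching column of that matrix. The matrix here is the Cholesky factor of a made-up covariance with correlation 0.9, so $e_1$ is sent to $(1, 0.9)$ and $e_2$ to $(0, \sqrt{0.19}) \approx (0, 0.436)$.

```python
import numpy as np

# Made-up covariance with correlation 0.9 and its lower-triangular Cholesky factor
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
L = np.linalg.cholesky(cov)  # cov = L @ L.T

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# Each unit vector is sent to the matching column of L
print(L @ e1)  # equals L[:, 0], i.e. (1, 0.9)
print(L @ e2)  # equals L[:, 1], i.e. (0, sqrt(0.19))
```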


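Conversely, if you multiply a correlated distribution with covariance $\Sigma$ by the inverse of a square root of $\Sigma$ (the inverse of a matrix $A$ with $A A^\top = \Sigma$, such as the Cholesky factor), it becomes white data again, and there is no covariance anymore. This is also where $\Sigma^{-1}$ in the formula comes from: $\lVert A^{-1} v \rVert^2 = v^\top (A A^\top)^{-1} v = v^\top \Sigma^{-1} v$.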
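Strictly speaking, it is the inverse of a square root of $\Sigma$ (for example $L^{-1}$ with $\Sigma = L L^\top$) that turns correlated data back into white data; multiplying by $\Sigma^{-1}$ itself would leave covariance $\Sigma^{-1}$. $\Sigma^{-1}$ still appears in the distance formula because $\lVert L^{-1} v \rVert^2 = v^\top \Sigma^{-1} v$. A NumPy sketch with a made-up $\Sigma$ and a fixed seed that shows both effects:

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up covariance and its Cholesky factor (cov = L @ L.T)
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])
L = np.linalg.cholesky(cov)
correlated = rng.standard_normal((100_000, 2)) @ L.T  # sample covariance ~ cov

# Undo the distortion: apply L^{-1} to every sample by solving L z = x column-wise
whitened = np.linalg.solve(L, correlated.T).T
print(np.cov(whitened, rowvar=False))  # close to the identity matrix

# Multiplying by cov^{-1} directly gives covariance cov^{-1}, not the identity
naive = correlated @ np.linalg.inv(cov).T
print(np.cov(naive, rowvar=False))  # close to inv(cov)
```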

This is exactly what we are doing: undistorting the vector so that it lands on a "circle" instead of an ellipse.
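To see the "circle instead of an ellipse" idea numerically, the sketch below (made-up $\mu$ and $\Sigma$, NumPy assumed) takes points on a circle of radius 2, distorts them with the Cholesky factor $L$ of $\Sigma$ into an ellipse around $\mu$, and checks that every point has Mahalanobis distance 2 even though their Euclidean distances from $\mu$ differ.

```python
import numpy as np

# Made-up distribution parameters
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])
mu = np.array([1.0, -1.0])
L = np.linalg.cholesky(cov)  # cov = L @ L.T

r = 2.0
angles = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
circle = r * np.stack([np.cos(angles), np.sin(angles)])  # 2 x 8: points on a circle of radius r
ellipse = mu[:, None] + L @ circle                      # distorted into an ellipse around mu

diffs = ellipse - mu[:, None]
inv_cov = np.linalg.inv(cov)
# Per column j: sqrt(d_j^T cov^{-1} d_j)
mahal = np.sqrt(np.sum(diffs * (inv_cov @ diffs), axis=0))
euclid = np.linalg.norm(diffs, axis=0)

print(mahal)   # all equal to r
print(euclid)  # varies around the ellipse
```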

TL;DR

From this:


To this: